Consider models with either size or predation, both, and both with interaction Focus on inferred variation across tanks, and why it changes with different models
# include the 'hide' for the code blocks where you don't want to print results
library(rethinking)
## Loading required package: rstan
## Loading required package: ggplot2
## Loading required package: StanHeaders
## rstan (Version 2.12.1, packaged: 2016-09-11 13:07:50 UTC, GitRev: 85f7a56811da)
## For execution on a local, multicore CPU with excess RAM we recommend calling
## rstan_options(auto_write = TRUE)
## options(mc.cores = parallel::detectCores())
## Loading required package: parallel
## rethinking (Version 1.59)
# get map2stan up and ready
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
data("reedfrogs")
d <- reedfrogs
head(d)
tail(d)
summary(d)
I think I need to make dummy variables first. For pred, set no = 0 and pred = 1 for size, set small = 0 and big = 1
d$pred_d <- ifelse(d$pred == "pred", 1, 0)
d$size_d <- ifelse(d$size == "big", 1, 0)
A quick glance at the data
library(ggplot2)
pl <- ggplot(d, aes(pred, propsurv))
pl + geom_boxplot() + facet_grid(. ~ size) + geom_jitter(width = 0.2)
Hmm, seems like ‘big’ may be a disadvantage for frogs with predators around
To start, will re-run original model, and then start adding to it
# make the tank cluster variable
d$tank <- 1:nrow(d)
head(d)
# fit intercept only model
m12M1.1 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] ,
a_tank[tank] ~ dnorm( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0, 1)
),
data=d, iter = 4000, chains = 4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## The following numerical problems occured the indicated number of times after warmup on chain 2
## count
## Exception thrown at line 17: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 3
## count
## Exception thrown at line 17: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 4
## count
## Exception thrown at line 17: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
plot(m12M1.1)
## Waiting to draw page 2 of 4
## Waiting to draw page 3 of 4
## Waiting to draw page 4 of 4
# intercept plus predation
m12M1.2 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] + b*pred_d,
a_tank[tank] ~ dnorm( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0, 1) ,
b ~ dnorm(0,1)
),
data=d, iter = 4000, chains = 4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## The following numerical problems occured the indicated number of times after warmup on chain 1
## count
## Exception thrown at line 20: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 2
## count
## Exception thrown at line 20: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
plot(m12M1.2)
## Waiting to draw page 2 of 4
## Waiting to draw page 3 of 4
## Waiting to draw page 4 of 4
# intercept plus size
m12M1.3 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] + b*size_d,
a_tank[tank] ~ dnorm( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0, 1) ,
b ~ dnorm(0,1)
),
data=d, iter = 4000, chains = 4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## The following numerical problems occured the indicated number of times after warmup on chain 1
## count
## Exception thrown at line 20: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 2
## count
## Exception thrown at line 20: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
plot(m12M1.3)
## Waiting to draw page 2 of 4
## Waiting to draw page 3 of 4
## Waiting to draw page 4 of 4
# intercept plus size and predation
m12M1.4 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] + b_p*pred_d + b_s*size_d ,
a_tank[tank] ~ dnorm( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0, 1) ,
c(b_p, b_s) ~ dnorm(0,1)
),
data=d, iter = 4000, chains = 4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## The following numerical problems occured the indicated number of times after warmup on chain 1
## count
## Exception thrown at line 23: normal_log: Scale parameter is 0, but must be > 0! 2
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 3
## count
## Exception thrown at line 23: normal_log: Scale parameter is 0, but must be > 0! 2
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 4
## count
## Exception thrown at line 23: normal_log: Scale parameter is 0, but must be > 0! 2
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
plot(m12M1.4)
## Waiting to draw page 2 of 4
## Waiting to draw page 3 of 4
## Waiting to draw page 4 of 4
# intercept plus size and predation and their interaction
m12M1.5 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] + b_p*pred_d + b_s*size_d + b_ps*pred_d*size_d,
a_tank[tank] ~ dnorm( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0, 1) ,
c(b_p, b_s, b_ps) ~ dnorm(0,1)
),
data=d, iter = 4000, chains = 4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## The following numerical problems occured the indicated number of times after warmup on chain 1
## count
## Exception thrown at line 25: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## The following numerical problems occured the indicated number of times after warmup on chain 4
## count
## Exception thrown at line 25: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
plot(m12M1.5)
## Waiting to draw page 2 of 4
## Waiting to draw page 3 of 4
## Waiting to draw page 4 of 4
OK, those seemed to run OK.
Now how to focus on inferred variation across tanks? I think that would be the ‘sigma’ value for the global intercept (?)
# precis(m12M1.1, depth =2)[sigma] # fails - object not subsettable
precis(m12M1.1, depth =1) # sigma is 1.62 (intercept only)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 1.30 0.25 0.90 1.68 8000 1
## sigma 1.62 0.22 1.26 1.94 4996 1
precis(m12M1.2, depth =1) # sigma is 0.85 (plus predation)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 2.43 0.22 2.07 2.79 1485 1.00
## sigma 0.84 0.15 0.61 1.06 2647 1.00
## b -2.31 0.29 -2.74 -1.82 945 1.01
precis(m12M1.3, depth =1) # sigma is 1.62 (plus size)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 1.41 0.33 0.87 1.91 1804 1.00
## sigma 1.62 0.22 1.28 1.96 5051 1.00
## b -0.23 0.45 -0.98 0.48 817 1.01
precis(m12M1.4, depth =1) # sigma is 0.79 (plus size plus predation)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 2.61 0.26 2.18 3.01 1131 1
## sigma 0.80 0.14 0.59 1.03 2638 1
## b_p -2.33 0.29 -2.76 -1.84 1263 1
## b_s -0.35 0.28 -0.82 0.08 1978 1
precis(m12M1.5, depth =1) # sigma is 0.74 (plus predation and size and interaction)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 2.32 0.27 1.88 2.74 775 1.01
## sigma 0.74 0.15 0.51 0.96 1735 1.00
## b_p -1.80 0.34 -2.35 -1.26 849 1.01
## b_s 0.29 0.38 -0.31 0.89 1226 1.00
## b_ps -1.18 0.47 -1.97 -0.45 1642 1.00
I believe this tells me that +/- predation explains a great deal of the variation between tanks.
Can you reconcile the differences in WAIC with posterior distributions of model?
compare(m12M1.1, m12M1.2, m12M1.3, m12M1.4, m12M1.5)
## WAIC pWAIC dWAIC weight SE dSE
## m12M1.2 1000.0 28.6 0.0 0.39 36.84 NA
## m12M1.4 1000.2 28.2 0.2 0.34 36.92 1.42
## m12M1.5 1000.8 27.5 0.8 0.26 37.02 2.76
## m12M1.1 1009.2 37.6 9.3 0.00 37.96 6.02
## m12M1.3 1009.7 37.8 9.7 0.00 38.03 6.13
# models 2, 4, 5 seem pretty close (model 2 a bit better; since it is simpler, we should favor it)
# while models 1 and 3 are negligible
precis(m12M1.1)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 1.30 0.25 0.90 1.68 8000 1
## sigma 1.62 0.22 1.26 1.94 4996 1
precis(m12M1.2)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 2.43 0.22 2.07 2.79 1485 1.00
## sigma 0.84 0.15 0.61 1.06 2647 1.00
## b -2.31 0.29 -2.74 -1.82 945 1.01
precis(m12M1.3)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 1.41 0.33 0.87 1.91 1804 1.00
## sigma 1.62 0.22 1.28 1.96 5051 1.00
## b -0.23 0.45 -0.98 0.48 817 1.01
precis(m12M1.4)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 2.61 0.26 2.18 3.01 1131 1
## sigma 0.80 0.14 0.59 1.03 2638 1
## b_p -2.33 0.29 -2.76 -1.84 1263 1
## b_s -0.35 0.28 -0.82 0.08 1978 1
precis(m12M1.5)
## 48 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 2.32 0.27 1.88 2.74 775 1.01
## sigma 0.74 0.15 0.51 0.96 1735 1.00
## b_p -1.80 0.34 -2.35 -1.26 849 1.01
## b_s 0.29 0.38 -0.31 0.89 1226 1.00
## b_ps -1.18 0.47 -1.97 -0.45 1642 1.00
# comparing models 2,4 and 5, stdev for a is lowest for model 2
# compare WAIC
(m12M1.models <- compare(m12M1.1, m12M1.2, m12M1.3, m12M1.4, m12M1.5))
## WAIC pWAIC dWAIC weight SE dSE
## m12M1.2 1000.0 28.6 0.0 0.39 36.84 NA
## m12M1.4 1000.2 28.2 0.2 0.34 36.92 1.42
## m12M1.5 1000.8 27.5 0.8 0.26 37.02 2.76
## m12M1.1 1009.2 37.6 9.3 0.00 37.96 6.02
## m12M1.3 1009.7 37.8 9.7 0.00 38.03 6.13
par(mfrow=c(1,1))
plot(m12M1.models, SE = T, dSE = T)
What if I wanted to draw samples from the posteriors and compare to real data?
pred.m12M1.1 <- link(m12M1.1, data = d)
dim(pred.m12M1.1) #1000 rows, 48 colums
head(pred.m12M1.1)
# compute median intercept per tank
d$propsurv_est1 <- apply(pred.m12M1.1, 2, median)
pred.m12M1.2 <- link(m12M1.2, data = d)
# compute median intercept per tank
d$propsurv_est2 <- apply(pred.m12M1.2, 2, median)
pred.m12M1.3 <- link(m12M1.3, data = d)
# compute median intercept per tank
d$propsurv_est3 <- apply(pred.m12M1.3, 2, median)
pred.m12M1.4 <- link(m12M1.4, data = d)
# compute median intercept per tank
d$propsurv_est4 <- apply(pred.m12M1.4, 2, median)
pred.m12M1.5 <- link(m12M1.5, data = d)
# compute median intercept per tank
d$propsurv_est5 <- apply(pred.m12M1.5, 2, median)
OK, now how to easily plot the real data by model? what if I use ggplot and I facet by predation? I will need to put plot over plot
library(ggplot2)
# first, the actual data
pl.real <- ggplot(d, aes(tank, propsurv))
pl.real <- pl.real + geom_point(size=2, colour="blue")
# for model 1 (intercept only; not plotted exactly as in book due to predation facet grid)
p.real.mod1 <- ggplot() +
geom_point(data = d, colour = "blue", aes(x =tank, y =propsurv )) + #blue for real data
geom_point(data = d, colour = "red", aes(x =tank, y =propsurv_est1)) + # red for model
facet_grid(.~pred)
p.real.mod1
p.real.mod2 <- ggplot() +
geom_point(data = d, colour = "blue", aes(x =tank, y =propsurv )) + #blue for real data
geom_point(data = d, colour = "red", aes(x =tank, y =propsurv_est2)) + # red for model
facet_grid(.~pred)
p.real.mod2
# looks better
p.real.mod3 <- ggplot() +
geom_point(data = d, colour = "blue", aes(x =tank, y =propsurv )) + #blue for real data
geom_point(data = d, colour = "red", aes(x =tank, y =propsurv_est3)) + # red for model
facet_grid(.~pred)
p.real.mod3
# not great; about like #1
p.real.mod4 <- ggplot() +
geom_point(data = d, colour = "blue", aes(x =tank, y =propsurv )) + #blue for real data
geom_point(data = d, colour = "red", aes(x =tank, y =propsurv_est4)) + # red for model
facet_grid(.~pred)
p.real.mod4
# pretty good
p.real.mod2and4 <- ggplot() +
geom_jitter(data = d, colour = "blue", aes(x =tank, y =propsurv )) + #blue for real data
geom_jitter(data = d, colour = "red", aes(x =tank, y =propsurv_est2)) + # red for model 2
geom_jitter(data = d, colour = "green", aes(x =tank, y =propsurv_est4)) + # green for model 4
facet_grid(.~pred)
p.real.mod2and4
# can see models 2 and 4 are quite similar
p.real.mod1and2 <- ggplot() +
geom_jitter(data = d, colour = "blue", aes(x =tank, y =propsurv )) + #blue for real data
geom_jitter(data = d, colour = "red", aes(x =tank, y =propsurv_est1)) + # red for model 1
geom_jitter(data = d, colour = "green", aes(x =tank, y =propsurv_est2)) + # green for model 2
facet_grid(.~pred)
p.real.mod1and2
Frog survival model; comparing Gaussian and Cauchy distributions for the varying intercepts
First, the model from the book with Gaussian distributions for intercepts
library(rethinking)
data("reedfrogs")
??reedfrogs
d <- reedfrogs
str(d)
# get map2stan up and ready
library(rstan)
rstan_options(auto_write = TRUE)
options(mc.cores = parallel::detectCores())
# make the tank cluster variable
d$tank <- 1:nrow(d)
head(d)
m12.M3.1 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] ,
a_tank[tank] ~ dnorm( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0,1)
), data=d , iter=4000 , chains=4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## The following numerical problems occured the indicated number of times after warmup on chain 2
## count
## Exception thrown at line 17: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
Next, the same but Cauchy intstead of normal
p 260: The Cauchy distributions in the model definitions are implicitly half-Cauchy, a Cauchy defined over the positive reals only. This is because they are applied to a parameter, usually sigma , that is strictly positive. Stan figures out that you meant for it to be half-Cauchy. SO I guess I just won’t specify??
m12.M3.2 <- map2stan(
alist(
surv ~ dbinom( density , p ) ,
logit(p) <- a_tank[tank] ,
a_tank[tank] ~ dcauchy( a , sigma ) ,
a ~ dnorm(0,1) ,
sigma ~ dcauchy(0,1)
), data=d , iter=4000 , chains=4 )
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name pred is not numeric and not
## used
## Warning in FUN(X[[i]], ...): data with name size is not numeric and not
## used
## Computing WAIC
## Constructing posterior predictions
## Aggregated binomial counts detected. Splitting to 0/1 outcome for WAIC calculation.
Compare the posterior means of the intercepts, a-tank, to the posterior means produced in the chapter, using the customary Gaussian prior. Can you explain the pattern of differences?
precis(m12.M3.2,depth=2) # depth=2 displays varying effects
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a_tank[1] 2.02 0.86 0.69 3.26 8000 1.00
## a_tank[2] 6.45 11.93 0.47 11.33 700 1.00
## a_tank[3] 1.09 0.61 0.11 2.04 8000 1.00
## a_tank[4] 13.68 37.63 0.64 15.87 51 1.08
## a_tank[5] 1.99 0.83 0.72 3.21 8000 1.00
## a_tank[6] 2.01 0.85 0.75 3.24 4249 1.00
## a_tank[7] 6.39 10.36 0.78 11.39 407 1.01
## a_tank[8] 2.01 0.85 0.75 3.31 8000 1.00
## a_tank[9] -0.08 0.66 -1.11 0.99 8000 1.00
## a_tank[10] 2.02 0.87 0.67 3.23 4921 1.00
## a_tank[11] 1.10 0.63 0.06 2.07 8000 1.00
## a_tank[12] 0.74 0.61 -0.18 1.73 8000 1.00
## a_tank[13] 1.10 0.62 0.09 2.07 8000 1.00
## a_tank[14] 0.34 0.64 -0.68 1.35 8000 1.00
## a_tank[15] 2.03 0.88 0.74 3.34 5499 1.00
## a_tank[16] 2.02 0.85 0.68 3.23 5399 1.00
## a_tank[17] 2.85 0.93 1.53 4.23 5400 1.00
## a_tank[18] 2.27 0.66 1.23 3.26 8000 1.00
## a_tank[19] 1.92 0.55 1.05 2.74 8000 1.00
## a_tank[20] 35.11 126.02 1.23 37.32 63 1.03
## a_tank[21] 2.27 0.66 1.25 3.24 8000 1.00
## a_tank[22] 2.27 0.67 1.27 3.28 8000 1.00
## a_tank[23] 2.26 0.65 1.24 3.23 8000 1.00
## a_tank[24] 1.66 0.49 0.85 2.41 8000 1.00
## a_tank[25] -1.06 0.48 -1.79 -0.26 8000 1.00
## a_tank[26] 0.24 0.40 -0.39 0.89 8000 1.00
## a_tank[27] -1.59 0.55 -2.44 -0.72 8000 1.00
## a_tank[28] -0.45 0.42 -1.11 0.23 8000 1.00
## a_tank[29] 0.23 0.42 -0.46 0.88 8000 1.00
## a_tank[30] 1.44 0.45 0.70 2.11 8000 1.00
## a_tank[31] -0.64 0.44 -1.32 0.07 8000 1.00
## a_tank[32] -0.28 0.42 -0.95 0.38 8000 1.00
## a_tank[33] 3.26 0.99 1.80 4.70 5295 1.00
## a_tank[34] 2.60 0.67 1.57 3.63 8000 1.00
## a_tank[35] 2.60 0.67 1.54 3.58 8000 1.00
## a_tank[36] 1.98 0.49 1.21 2.73 8000 1.00
## a_tank[37] 1.97 0.48 1.19 2.67 8000 1.00
## a_tank[38] 15.03 35.52 1.33 24.22 80 1.05
## a_tank[39] 2.60 0.68 1.52 3.61 8000 1.00
## a_tank[40] 2.24 0.56 1.39 3.12 8000 1.00
## a_tank[41] -2.00 0.53 -2.81 -1.15 8000 1.00
## a_tank[42] -0.57 0.36 -1.14 0.02 8000 1.00
## a_tank[43] -0.44 0.36 -1.02 0.13 8000 1.00
## a_tank[44] -0.31 0.35 -0.86 0.25 8000 1.00
## a_tank[45] 0.64 0.35 0.10 1.20 8000 1.00
## a_tank[46] -0.57 0.36 -1.15 0.01 8000 1.00
## a_tank[47] 1.97 0.48 1.19 2.71 8000 1.00
## a_tank[48] 0.04 0.34 -0.51 0.59 8000 1.00
## a 1.40 0.29 0.96 1.90 5275 1.00
## sigma 1.03 0.24 0.64 1.38 5077 1.00
plot(precis(m12.M3.1, depth =2))
plot(precis(m12.M3.2,depth=2)) # also plot
par(mfrow = c(1,2))
plot(precis(m12.M3.1, depth =2)) # gaussian
plot(precis(m12.M3.2,depth=2)) # Cauchy
Overall, much lower SD with cauchy (although some are really uncertain, like 2, 7, 20, 38) These large SD tanks had 100% survival.
“at any moment in a Cauchy sampling process, you are able to draw an extreme value that overwhelms all the previous draws” I guess that is what is going on - with a high prop survival, can get very wrong estimate.
Julin: used extract.samples to get posterior estimates and then plotted. Better, I think.
Fit multilevel model to the chimp data
library(rethinking)
data(chimpanzees)
d <- chimpanzees
d$recipient <- NULL # this gets rid of NA
head(d)
m12M4.ch <- map2stan(
alist(
pulled_left ~ dbinom(1, p) ,
logit(p) <- a + a_actor[actor] + (bp + bpC*condition)*prosoc_left ,
a_actor[actor] ~ dnorm(0, sigma_actor) ,
a ~ dnorm(0,10) ,
bp ~ dnorm(0,10) ,
bpC ~ dnorm(0,10),
sigma_actor ~ dcauchy(0, 1)
),
data = d, warmup = 1000, iter = 5000, chains = 4, cores = 3)
## Computing WAIC
## Constructing posterior predictions
precis(m12M4.ch)
## 7 vector or matrix parameters omitted in display. Use depth=2 to show them.
# similar to above, but block now added
# prep data, as block is reserved by Stan
d$block_id <- d$block # name 'block' is reserved by Stan
head(d)
m12M4.block <- map2stan(
alist(
pulled_left ~ dbinom(1, p) ,
logit(p) <- a_actor[actor] + a_block[block_id] + (bp + bpC*condition)*prosoc_left ,
a_actor[actor] ~ dnorm(alpha, sigma_actor) ,
a_block[block_id] ~ dnorm(gamma, sigma_block) ,
c(alpha,gamma,bp, bpC) ~ dnorm(0,10) ,
c(sigma_actor, sigma_block) ~ dcauchy(0, 1)
),
data = d, warmup = 1000, iter = 5000, chains = 4, cores = 3)
## Warning: There were 4128 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
## Computing WAIC
## Constructing posterior predictions
## Warning in map2stan(alist(pulled_left ~ dbinom(1, p), logit(p) <- a_actor[actor] + : There were 4128 divergent iterations during sampling.
## Check the chains (trace plots, n_eff, Rhat) carefully to ensure they are valid.
really didn’ tlike the above model compare the posterior distributions of the above
precis(m12M4.block) # Rhat values not 1! n_eff low for alpha and gamma
## Warning in precis(m12M4.block): There were 4128 divergent iterations during sampling.
## Check the chains (trace plots, n_eff, Rhat) carefully to ensure they are valid.
## 13 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## alpha 0.18 6.92 -10.37 11.47 444 1.00
## gamma 0.18 6.88 -10.92 10.89 494 1.01
## bp 0.81 0.26 0.41 1.22 158 1.01
## bpC -0.10 0.30 -0.57 0.35 74 1.04
## sigma_actor 2.27 1.05 1.07 3.39 146 1.02
## sigma_block 0.29 0.17 0.10 0.48 124 1.04
# and look at those huge intervals for alpha and gamma
precis(m12M4.ch)
## 7 vector or matrix parameters omitted in display. Use depth=2 to show them.
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a 0.44 0.93 -0.98 1.90 2372 1
## bp 0.82 0.25 0.42 1.23 8174 1
## bpC -0.13 0.29 -0.59 0.34 8170 1
## sigma_actor 2.26 0.90 1.10 3.41 3768 1
compare(m12M4.ch, m12M4.block)
## WAIC pWAIC dWAIC weight SE dSE
## m12M4.ch 531.0 7.9 0.0 0.67 19.46 NA
## m12M4.block 532.4 10.7 1.4 0.33 19.68 2.36
I think problem has to do with the fact that htere is not a grand mean a parameter (p 373) in my second model.
Bengali women, in district, use.contraception, and urban
# get data, sort out district variable
library(rethinking)
data("bangladesh")
d <- bangladesh
head(d)
## woman district use.contraception living.children age.centered urban
## 1 1 1 0 4 18.4400 1
## 2 2 1 0 1 -5.5599 1
## 3 3 1 0 3 1.4400 1
## 4 4 1 0 4 8.4400 1
## 5 5 1 0 1 -13.5590 1
## 6 6 1 0 1 -11.5600 1
sort(unique(d$district)) # need to make ths good index varaible (continuous)
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51 52 53 55 56 57 58 59 60 61
d$district_id <- as.integer(as.factor(d$district))
sort(unique(d$district_id)) # ok
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
## [24] 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
## [47] 47 48 49 50 51 52 53 54 55 56 57 58 59 60
# and get ride of . in the colnames
colnames(d) <- sub(".","_",colnames(d),fixed=TRUE)
Now I want to predict contraception use clstered by district_id
traditional fixed-effect model using dummy variables for district first
m12H1.1 <- map2stan(alist(
use_contraception ~ dbinom(1,p),
logit(p) <- a[district_id],
a[district_id] ~ dnorm(0,5)),
data=d,
chains = 4)
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 1).
##
## Chain 1, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 1.90859 seconds (Warm-up)
## 1.70597 seconds (Sampling)
## 3.61456 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 2).
##
## Chain 2, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 1.90374 seconds (Warm-up)
## 1.41727 seconds (Sampling)
## 3.32101 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 3).
##
## Chain 3, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 1.96567 seconds (Warm-up)
## 1.92117 seconds (Sampling)
## 3.88684 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 4).
##
## Chain 4, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 2.04219 seconds (Warm-up)
## 1.51078 seconds (Sampling)
## 3.55297 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 1).
## WARNING: No variance estimation is
## performed for num_warmup < 20
##
##
## Chain 1, Iteration: 1 / 1 [100%] (Sampling)
## Elapsed Time: 3e-06 seconds (Warm-up)
## 0.000569 seconds (Sampling)
## 0.000572 seconds (Total)
## Computing WAIC
## Constructing posterior predictions
## [ 400 / 4000 ]
[ 800 / 4000 ]
[ 1200 / 4000 ]
[ 1600 / 4000 ]
[ 2000 / 4000 ]
[ 2400 / 4000 ]
[ 2800 / 4000 ]
[ 3200 / 4000 ]
[ 3600 / 4000 ]
[ 4000 / 4000 ]
plot(m12H1.1)
## Waiting to draw page 2 of 4
## Waiting to draw page 3 of 4
## Waiting to draw page 4 of 4
precis(m12H1.1, depth=2)
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a[1] -1.08 0.21 -1.39 -0.72 4000 1
## a[2] -0.63 0.46 -1.38 0.08 4000 1
## a[3] 4.38 3.01 0.21 9.21 2724 1
## a[4] 0.00 0.37 -0.58 0.60 4000 1
## a[5] -0.59 0.34 -1.17 -0.09 4000 1
## a[6] -0.90 0.27 -1.33 -0.47 4000 1
## a[7] -1.00 0.54 -1.87 -0.17 4000 1
## a[8] -0.50 0.34 -1.03 0.06 4000 1
## a[9] -0.87 0.48 -1.65 -0.10 4000 1
## a[10] -2.83 1.17 -4.68 -1.13 4000 1
## a[11] -6.19 2.56 -9.76 -2.36 2468 1
## a[12] -0.66 0.40 -1.25 -0.03 4000 1
## a[13] -0.35 0.41 -1.02 0.28 4000 1
## a[14] 0.52 0.19 0.23 0.84 4000 1
## a[15] -0.59 0.45 -1.28 0.13 4000 1
## a[16] 0.21 0.45 -0.47 0.96 4000 1
## a[17] -0.92 0.47 -1.62 -0.15 4000 1
## a[18] -0.67 0.31 -1.14 -0.18 4000 1
## a[19] -0.49 0.42 -1.14 0.21 4000 1
## a[20] -0.43 0.54 -1.32 0.38 4000 1
## a[21] -0.47 0.50 -1.27 0.32 4000 1
## a[22] -1.45 0.56 -2.28 -0.52 4000 1
## a[23] -1.07 0.61 -2.07 -0.16 4000 1
## a[24] -2.86 1.11 -4.35 -1.01 3308 1
## a[25] -0.21 0.25 -0.61 0.19 4000 1
## a[26] -0.51 0.59 -1.48 0.40 4000 1
## a[27] -1.55 0.39 -2.17 -0.93 4000 1
## a[28] -1.15 0.34 -1.68 -0.61 4000 1
## a[29] -0.96 0.40 -1.59 -0.32 4000 1
## a[30] -0.03 0.25 -0.45 0.36 4000 1
## a[31] -0.18 0.35 -0.72 0.39 4000 1
## a[32] -1.39 0.52 -2.20 -0.57 4000 1
## a[33] -0.31 0.56 -1.24 0.56 4000 1
## a[34] 0.67 0.36 0.12 1.25 4000 1
## a[35] 0.00 0.29 -0.44 0.45 4000 1
## a[36] -0.64 0.55 -1.50 0.24 4000 1
## a[37] 0.16 0.58 -0.79 1.05 4000 1
## a[38] -1.00 0.64 -2.01 0.00 4000 1
## a[39] 0.00 0.40 -0.63 0.63 4000 1
## a[40] -0.15 0.31 -0.62 0.39 4000 1
## a[41] 0.00 0.40 -0.64 0.63 4000 1
## a[42] 0.19 0.63 -0.87 1.10 4000 1
## a[43] 0.14 0.30 -0.35 0.61 4000 1
## a[44] -1.31 0.49 -2.13 -0.59 4000 1
## a[45] -0.71 0.34 -1.23 -0.14 4000 1
## a[46] 0.09 0.22 -0.26 0.45 4000 1
## a[47] -0.14 0.53 -0.96 0.73 4000 1
## a[48] 0.09 0.31 -0.38 0.60 4000 1
## a[49] -5.02 2.89 -9.20 -0.77 2925 1
## a[50] -0.11 0.48 -0.89 0.64 4000 1
## a[51] -0.17 0.33 -0.74 0.32 4000 1
## a[52] -0.23 0.26 -0.65 0.17 4000 1
## a[53] -0.33 0.48 -1.10 0.39 4000 1
## a[54] -1.90 1.22 -3.74 0.00 4000 1
## a[55] 0.32 0.31 -0.15 0.82 4000 1
## a[56] -1.55 0.53 -2.39 -0.75 4000 1
## a[57] -0.19 0.35 -0.76 0.34 4000 1
## a[58] -2.51 1.16 -4.24 -0.73 2972 1
## a[59] -1.31 0.45 -2.04 -0.63 4000 1
## a[60] -1.33 0.38 -1.90 -0.72 4000 1
And now multilevel model with varying intercepts for district
m12H1.2 <- map2stan(alist(
use_contraception ~ dbinom(1,p),
logit(p) <- a_district[district_id],
a_district[district_id] ~ dnorm(a,sigma_d),
a ~ dnorm(0,5) ,
sigma_d ~ dcauchy(0, 1)),
data=d,
chains = 4)
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 1).
##
## Chain 1, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 1, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 1, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 1, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 1, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 1, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 1, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 1, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 1, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 1, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 1, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 1.9619 seconds (Warm-up)
## 1.54993 seconds (Sampling)
## 3.51183 seconds (Total)
## The following numerical problems occured the indicated number of times after warmup on chain 1
## count
## Exception thrown at line 16: normal_log: Scale parameter is 0, but must be > 0! 1
## When a numerical problem occurs, the Hamiltonian proposal gets rejected.
## See http://mc-stan.org/misc/warnings.html#exception-hamiltonian-proposal-rejected
## If the number in the 'count' column is small, do not ask about this message on stan-users.
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 2).
##
## Chain 2, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 2, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 2, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 2, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 2, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 2, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 2, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 2, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 2, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 2, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 2, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 1.99334 seconds (Warm-up)
## 1.46799 seconds (Sampling)
## 3.46133 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 3).
##
## Chain 3, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 3, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 3, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 3, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 3, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 3, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 3, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 3, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 3, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 3, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 3, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 3, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 2.05063 seconds (Warm-up)
## 1.6079 seconds (Sampling)
## 3.65853 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 4).
##
## Chain 4, Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 4, Iteration: 200 / 2000 [ 10%] (Warmup)
## Chain 4, Iteration: 400 / 2000 [ 20%] (Warmup)
## Chain 4, Iteration: 600 / 2000 [ 30%] (Warmup)
## Chain 4, Iteration: 800 / 2000 [ 40%] (Warmup)
## Chain 4, Iteration: 1000 / 2000 [ 50%] (Warmup)
## Chain 4, Iteration: 1001 / 2000 [ 50%] (Sampling)
## Chain 4, Iteration: 1200 / 2000 [ 60%] (Sampling)
## Chain 4, Iteration: 1400 / 2000 [ 70%] (Sampling)
## Chain 4, Iteration: 1600 / 2000 [ 80%] (Sampling)
## Chain 4, Iteration: 1800 / 2000 [ 90%] (Sampling)
## Chain 4, Iteration: 2000 / 2000 [100%] (Sampling)
## Elapsed Time: 2.07131 seconds (Warm-up)
## 1.57381 seconds (Sampling)
## 3.64512 seconds (Total)
##
##
## SAMPLING FOR MODEL 'use_contraception ~ dbinom(1, p)' NOW (CHAIN 1).
## WARNING: No variance estimation is
## performed for num_warmup < 20
##
##
## Chain 1, Iteration: 1 / 1 [100%] (Sampling)
## Elapsed Time: 7e-06 seconds (Warm-up)
## 0.000736 seconds (Sampling)
## 0.000743 seconds (Total)
## Computing WAIC
## Constructing posterior predictions
## [ 400 / 4000 ]
[ 800 / 4000 ]
[ 1200 / 4000 ]
[ 1600 / 4000 ]
[ 2000 / 4000 ]
[ 2400 / 4000 ]
[ 2800 / 4000 ]
[ 3200 / 4000 ]
[ 3600 / 4000 ]
[ 4000 / 4000 ]
plot(m12H1.2)
## Waiting to draw page 2 of 5
## Waiting to draw page 3 of 5
## Waiting to draw page 4 of 5
## Waiting to draw page 5 of 5
precis(m12H1.2, depth=2)
## Mean StdDev lower 0.89 upper 0.89 n_eff Rhat
## a_district[1] -1.00 0.20 -1.32 -0.68 4000 1
## a_district[2] -0.58 0.35 -1.12 0.00 4000 1
## a_district[3] -0.24 0.50 -1.06 0.53 4000 1
## a_district[4] -0.18 0.30 -0.65 0.31 4000 1
## a_district[5] -0.57 0.29 -1.02 -0.12 4000 1
## a_district[6] -0.81 0.24 -1.19 -0.44 4000 1
## a_district[7] -0.77 0.37 -1.38 -0.20 4000 1
## a_district[8] -0.51 0.28 -0.96 -0.05 4000 1
## a_district[9] -0.71 0.34 -1.25 -0.17 4000 1
## a_district[10] -1.15 0.43 -1.83 -0.47 4000 1
## a_district[11] -1.55 0.45 -2.24 -0.84 4000 1
## a_district[12] -0.62 0.32 -1.12 -0.10 4000 1
## a_district[13] -0.43 0.33 -0.93 0.10 4000 1
## a_district[14] 0.39 0.18 0.10 0.67 4000 1
## a_district[15] -0.56 0.34 -1.08 -0.03 4000 1
## a_district[16] -0.12 0.36 -0.75 0.38 4000 1
## a_district[17] -0.75 0.35 -1.31 -0.20 4000 1
## a_district[18] -0.63 0.27 -1.08 -0.20 4000 1
## a_district[19] -0.50 0.33 -1.00 0.03 4000 1
## a_district[20] -0.48 0.38 -1.07 0.14 4000 1
## a_district[21] -0.51 0.36 -1.08 0.06 4000 1
## a_district[22] -0.97 0.38 -1.56 -0.38 4000 1
## a_district[23] -0.76 0.38 -1.33 -0.12 4000 1
## a_district[24] -1.18 0.43 -1.84 -0.49 4000 1
## a_district[25] -0.28 0.22 -0.62 0.08 4000 1
## a_district[26] -0.52 0.39 -1.19 0.07 4000 1
## a_district[27] -1.19 0.31 -1.65 -0.69 4000 1
## a_district[28] -0.97 0.27 -1.43 -0.56 4000 1
## a_district[29] -0.80 0.31 -1.31 -0.33 4000 1
## a_district[30] -0.13 0.23 -0.48 0.25 4000 1
## a_district[31] -0.29 0.28 -0.75 0.17 4000 1
## a_district[32] -0.97 0.35 -1.54 -0.43 4000 1
## a_district[33] -0.42 0.38 -1.05 0.16 4000 1
## a_district[34] 0.28 0.30 -0.19 0.75 4000 1
## a_district[35] -0.13 0.26 -0.56 0.26 4000 1
## a_district[36] -0.59 0.36 -1.17 -0.04 4000 1
## a_district[37] -0.22 0.39 -0.87 0.38 4000 1
## a_district[38] -0.72 0.39 -1.32 -0.09 4000 1
## a_district[39] -0.20 0.31 -0.71 0.30 4000 1
## a_district[40] -0.26 0.27 -0.70 0.14 4000 1
## a_district[41] -0.21 0.32 -0.71 0.31 4000 1
## a_district[42] -0.23 0.41 -0.91 0.38 4000 1
## a_district[43] -0.04 0.26 -0.47 0.37 4000 1
## a_district[44] -0.96 0.33 -1.50 -0.45 4000 1
## a_district[45] -0.65 0.29 -1.13 -0.19 4000 1
## a_district[46] -0.01 0.19 -0.32 0.30 4000 1
## a_district[47] -0.34 0.37 -0.99 0.20 4000 1
## a_district[48] -0.07 0.27 -0.49 0.37 4000 1
## a_district[49] -0.87 0.49 -1.65 -0.10 4000 1
## a_district[50] -0.31 0.34 -0.85 0.24 4000 1
## a_district[51] -0.29 0.29 -0.73 0.19 4000 1
## a_district[52] -0.30 0.23 -0.67 0.09 4000 1
## a_district[53] -0.42 0.36 -0.95 0.16 4000 1
## a_district[54] -0.78 0.45 -1.49 -0.06 4000 1
## a_district[55] 0.10 0.27 -0.34 0.53 4000 1
## a_district[56] -1.07 0.34 -1.59 -0.50 4000 1
## a_district[57] -0.30 0.29 -0.76 0.17 4000 1
## a_district[58] -1.01 0.43 -1.69 -0.30 4000 1
## a_district[59] -1.00 0.31 -1.47 -0.48 4000 1
## a_district[60] -1.05 0.30 -1.50 -0.56 4000 1
## a -0.54 0.09 -0.68 -0.41 2415 1
## sigma_d 0.52 0.08 0.39 0.65 759 1
Now plot district ID vs proportion using contraception on vertical
library(reshape2)
pred.df <- data.frame(district_id= unique(d$district_id))
link.vary <- link(m12H1.2, data = pred.df, n= 4000)
## [ 400 / 4000 ]
[ 800 / 4000 ]
[ 1200 / 4000 ]
[ 1600 / 4000 ]
[ 2000 / 4000 ]
[ 2400 / 4000 ]
[ 2800 / 4000 ]
[ 3200 / 4000 ]
[ 3600 / 4000 ]
[ 4000 / 4000 ]
pred.df$est.vary.link <- apply(link.vary, 2, mean)
pred.df$est.vary.coef <- logistic(coef(m12H1.2)[1:60])
post.vary <- extract.samples(m12H1.2)$a_district
pred.df$est.vary.extract.samples <- logistic(apply(post.vary, 2, mean))
cor(pred.df[, 2:4])
## est.vary.link est.vary.coef
## est.vary.link 1.000000 0.999883
## est.vary.coef 0.999883 1.000000
## est.vary.extract.samples 0.999883 1.000000
## est.vary.extract.samples
## est.vary.link 0.999883
## est.vary.coef 1.000000
## est.vary.extract.samples 1.000000
not clear why we did this 3 differnet ways
plot.df <- data.frame(
district_id=1:60,
fixed=logistic(coef(m12H1.1)),
varying=logistic(coef(m12H1.2)[1:60]),
observed=tapply(d$use_contraception,d$district_id,function(x) sum(x)/length(x)))
plot.df.m <- melt(plot.df,id.var="district_id")
## Warning: attributes are not identical across measure variables; they will
## be dropped
pl <- ggplot(plot.df.m,aes(x=district_id,y=value,color=variable,shape=variable))
pl+geom_point(size=3)+geom_hline(yintercept = logistic(coef(m12H1.2)["a"]))
Shrinkage with the varying intercept model. In all cases the fixed model fits data better than varying model.
Trolley data
data("Trolley")
d <- Trolley
head(d)
## case response order id age male edu action intention
## 1 cfaqu 4 2 96;434 14 0 Middle School 0 0
## 2 cfbur 3 31 96;434 14 0 Middle School 0 0
## 3 cfrub 4 16 96;434 14 0 Middle School 0 0
## 4 cibox 3 32 96;434 14 0 Middle School 0 1
## 5 cibur 3 4 96;434 14 0 Middle School 0 1
## 6 cispe 3 9 96;434 14 0 Middle School 0 1
## contact story action2
## 1 1 aqu 1
## 2 1 bur 1
## 3 1 rub 1
## 4 1 box 1
## 5 1 bur 1
## 6 1 spe 1
summary(d)
## case response order id
## cfaqu : 331 Min. :1.000 Min. : 1.0 96;434 : 30
## cfbur : 331 1st Qu.:3.000 1st Qu.: 9.0 96;445 : 30
## cfrub : 331 Median :4.000 Median :16.5 96;451 : 30
## cibox : 331 Mean :4.199 Mean :16.5 96;456 : 30
## cibur : 331 3rd Qu.:6.000 3rd Qu.:24.0 96;458 : 30
## cispe : 331 Max. :7.000 Max. :32.0 96;466 : 30
## (Other):7944 (Other):9750
## age male edu
## Min. :10.00 Min. :0.000 Bachelor's Degree :3540
## 1st Qu.:26.00 1st Qu.:0.000 Some College :2460
## Median :36.00 Median :1.000 Master's Degree :1410
## Mean :37.49 Mean :0.574 Graduate Degree :1050
## 3rd Qu.:48.00 3rd Qu.:1.000 High School Graduate: 870
## Max. :72.00 Max. :1.000 Some High School : 420
## (Other) : 180
## action intention contact story
## Min. :0.0000 Min. :0.0000 Min. :0.0 box :1324
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0 bur :1324
## Median :0.0000 Median :0.0000 Median :0.0 spe : 993
## Mean :0.4333 Mean :0.4667 Mean :0.2 swi : 993
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0 aqu : 662
## Max. :1.0000 Max. :1.0000 Max. :1.0 boa : 662
## (Other):3972
## action2
## Min. :0.0000
## 1st Qu.:0.0000
## Median :1.0000
## Mean :0.6333
## 3rd Qu.:1.0000
## Max. :1.0000
##
Define and fit varying intercepts model for these data; cluster by id include action, intention, contact as ordinary terms This is an ordered categorical outcome; see p 339 for more help
#first, get rid of name 'case'
colnames(d)[1] <- "case_id"